Object Permanence Through Audio-Visual Representations
Authors
Abstract
As robots perform manipulation tasks and interact with objects, it is probable that they accidentally drop objects (e.g., due to an inadequate grasp of an unfamiliar object) that subsequently bounce out of their visual fields. To enable recovery from such errors, we draw upon the concept of object permanence: objects remain in existence even when they are not being sensed (e.g., seen) directly. In particular, we developed a multimodal neural network model, using the partial, observed trajectory and the audio resulting from the impact as its inputs, to predict the full end location of a dropped object. We empirically show that: 1) our method predicted locations in close proximity (i.e., within the field of view of the robot's wrist camera) to the actual locations, and 2) the robot was able to retrieve dropped objects by applying minimal vision-based pick-up adjustments. Additionally, our method outperformed five comparison baselines in retrieving dropped objects. Our results contribute to enabling object permanence for error recovery from object drops.
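The paper's trained model is not reproduced here, but the fusion idea described in the abstract, encoding the partial trajectory and the impact audio separately and combining them to regress an end location, can be sketched as follows. All dimensions, layer sizes, and the randomly initialized weights are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    # One hidden layer with a ReLU nonlinearity.
    h = np.maximum(0.0, x @ w1 + b1)
    return h @ w2 + b2

# Hypothetical inputs: ten 3-D waypoints of the partial observed
# trajectory (flattened) and a 64-bin summary of the impact audio.
traj = rng.standard_normal(30)
audio = rng.standard_normal(64)

# Random weights stand in for trained parameters (assumption).
wt1, bt1 = rng.standard_normal((30, 16)), np.zeros(16)
wt2, bt2 = rng.standard_normal((16, 8)), np.zeros(8)
wa1, ba1 = rng.standard_normal((64, 16)), np.zeros(16)
wa2, ba2 = rng.standard_normal((16, 8)), np.zeros(8)
wf, bf = rng.standard_normal((16, 2)), np.zeros(2)

# Encode each modality, fuse by concatenation, regress an (x, y)
# end location for the dropped object.
zt = mlp(traj, wt1, bt1, wt2, bt2)
za = mlp(audio, wa1, ba1, wa2, ba2)
end_xy = np.concatenate([zt, za]) @ wf + bf
print(end_xy.shape)  # (2,)
```

Late fusion by concatenation is only one design choice; the point of the sketch is that each modality gets its own encoder before a shared regression head predicts where the object came to rest.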
Similar resources
Cortical Plasticity of Audio–Visual Object Representations
Several regions in human temporal and frontal cortex are known to integrate visual and auditory object features. The processing of audio-visual (AV) associations in these regions has been found to be modulated by object familiarity. The aim of the present study was to explore training-induced plasticity in human cortical AV integration. We used functional magnetic resonance imaging to analyze t...
Object Category Detection Using Audio-Visual Cues
Categorization is one of the fundamental building blocks of cognitive systems. Object categorization has traditionally been addressed in the vision domain, even though cognitive agents are intrinsically multimodal. Indeed, biological systems combine several modalities in order to achieve robust categorization. In this paper we propose a multimodal approach to object category detection, using au...
Audio-Visual Object Extraction using Graph Cuts
We propose a novel method to automatically extract the audio-visual objects that are present in a scene. First, the synchrony between related events in audio and video channels is exploited to identify the possible locations of the sound sources. Video regions presenting a high coherence with the soundtrack are automatically labelled as being part of the audio-visual object. Next, a graph cut s...
Scene Understanding through Audio-Visual Fusion
Scene understanding involves the integration of a wide variety of information to produce a thorough description of the robot's environment. By integrating spatial, visual, and audio cues, we can provide a greater amount of understanding than can be obtained using any one of the modalities alone. In this paper, we describe our current work on using audition to enhance existing object detection and t...
Journal
Journal title: IEEE Access
Year: 2021
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2021.3115082